The idea of this exercise is to use Python for map visualization and to test how capable and easy to use it is. Python is often recommended for data analysis, and I think that reputation is deserved. So let’s see how well it handles map visualization.
For this exercise we will collect the basic information of the location of every Starbucks in Korea and plot every cafe on the map. We will also try a couple of features for map visualization.
Every data analysis requires a series of steps in order to use the data effectively. Although the main focus of this exercise is not an analysis result or conclusion, it is still worth following the same good practices as in a data analysis.
Depending on the source, there are different approaches to data analysis but in general they include:
Depending on the objectives of a project, there could be other steps, such as model training and testing. For the present exercise, the steps listed will be enough for map visualization.
#Import all the necessary modules.
import urllib3
import requests
import json
from pandas import json_normalize #pandas.io.json.json_normalize is deprecated since pandas 1.0
import folium
import pandas as pd
import branca
from folium import plugins
The data on the Starbucks cafes was obtained from the official site of Starbucks Korea. The site serves its store list by area, so a request has to be made for each area. Each area is identified by a number; Seoul, for example, is number 1.
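As a small illustration of the area number, the sketch below builds the form fields for a single area. The field names and values are the ones used in the request code of this exercise; the helper name `build_payload` is hypothetical, and nothing here contacts the site.

```python
def build_payload(area_number):
    """Return the form fields for one area (hypothetical helper)."""
    return {
        'ins_lat': '37.56682',
        'ins_lng': '126.97865',
        # The area number is sent as a two-digit string: 1 becomes '01'.
        'p_sido_cd': '{num:02d}'.format(num=area_number),
        'p_gugun_cd': '',
        'in_biz_cd': '',
        'set_date': '',
        'iend': '1000',
    }

seoul = build_payload(1)
print(seoul['p_sido_cd'])
```

Keeping the payload in one function means the loop over areas only has to change the number.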
Once the data is requested and obtained, it is cleaned and transformed into a dataframe that keeps only the information relevant for the visualization. The feature engineering for this exercise is very simple: we will compute the total number of cafes per district.
For the GeoJSON files, don't forget to check out the references below. A GeoJSON file is necessary to plot a choropleth map.
Since our data covers every Starbucks in the country, it is necessary to make one request for each of the areas into which the site divides the country.
A 'for loop' is implemented to obtain the cafes of every area and append each result to a final list, resulting in a list of dataframes that together contain every cafe. Finally, we take advantage of the 'for loop' to do some feature engineering on every dataframe and get the total number of cafes per district.
#We need a URL to obtain the data we will be using.
#In this case the URL is: https://www.starbucks.co.kr/store/store_map.do?disp=locale
#We assign the URL to a variable, in this case the variable ‘url’
#We use one URL per request
url = 'https://www.starbucks.co.kr/store/store_map.do?disp=locale'
url2= #...
#...
urls=[url, url2, url3, url4, url5, url6, url7, url8, url9, url10, url11, url12, url13, url14, url15, url16 ]
#Make a for loop to obtain the data for every area.
#Get the data for the cafes of a specific area
#and transform it to a Dataframe object.
#Add every dataframe to a list.
cafeslist=[]
cafeslocation=[]
for x in range(1, len(urls)+1):
    url=urls[x-1]
    data = {
        'ins_lat':'37.56682',
        'ins_lng':'126.97865',
        'p_sido_cd':'{num:02d}'.format(num=x),
        'p_gugun_cd':'',
        'in_biz_cd':'',
        'set_date':'',
        'iend':'1000',
    }
    r = requests.post(url, data=data) #We make a request to obtain the data for every area
    jo = json.loads(r.text) #The data is obtained in JSON format and needs to be transformed
    df = json_normalize(jo, 'list')
    df = df[['s_name', 'lat', 'lot', 'sido_name', 'gugun_name', 'doro_address', 'tel']]
    #Then every dataframe is appended to a list having all the cafes in this list.
    cafeslocation.append(df)
    #Taking advantage of the 'for loop' we can obtain the total
    #number of cafes per district since every loop refers to a dataframe
    #or an area.
    newlist=[]
    for name in df["gugun_name"]: #We append the district every time it appears in the area to a new list
        newlist.append(name.strip()) #strip() returns a new string, so its result is what we store
    setvalues=set(newlist) #We create a set with the name of every district in the area
    municipalitie=[] #We count the number of cafes by each time the district appears in the list
    for name in setvalues:
        y=[name,newlist.count(name)]
        municipalitie.append(y)
    #A dataframe is created with the number of cafes per district
    #and it is appended to a list with all the districts.
    dfdistrict=pd.DataFrame(municipalitie, columns=["Area","Number of cafes"])
    cafeslist.append(dfdistrict)
df.head(3)
print(municipalitie[1])
df.dtypes
cafeslist = pd.concat(cafeslist)
print(cafeslist)
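The per-district count built inside the loop could also be written more compactly. The sketch below is one possible alternative using `collections.Counter` from the standard library; the district names are made-up sample values standing in for `df["gugun_name"]`, not data from the site.

```python
from collections import Counter

# Made-up district names standing in for df["gugun_name"].
names = ['Gangnam-gu ', 'Jung-gu', 'Gangnam-gu', 'Mapo-gu', 'Jung-gu', 'Gangnam-gu']

# Strip stray whitespace first so that equal names are counted together.
counts = Counter(name.strip() for name in names)

# Same [district, count] shape as the 'municipalitie' list in the loop.
rows = [[district, total] for district, total in counts.items()]
print(rows)
```

The `rows` list can be passed to `pd.DataFrame` with the same two column names used above.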
Folium is a library that combines Python and the Leaflet.js library to visualize data on a map.
For this exercise we will plot every Starbucks on the map and use a choropleth map.
We start by plotting every Starbucks on the map with an icon. If you hover the mouse over an icon, it will show the name of the store (yes, even though they are all Starbucks, each one has a name that identifies it), and if you click on it, it will show the address.
map_osm = folium.Map(location=(37.56629, 126.979808), zoom_start=8)
for x in cafeslocation:
    for ix, row in x.iterrows():
        location = (row['lat'], row['lot'])
        folium.Marker(
            location,
            icon=folium.Icon(icon='fa-coffee', color='green', prefix='fa'),
            tooltip='Name: '+row['s_name'], #Shown when hovering over the icon
            popup='Cafe address: '+row['doro_address'], #Shown when clicking the icon
        ).add_to(map_osm)
map_osm
To plot a choropleth map, a GeoJSON or TopoJSON file is necessary. These formats represent geographical elements, in this case on our Folium map. The GeoJSON file chosen for the exercise contains the geographical representation of Korea's districts.
We will load the GeoJSON file into a variable and combine it with a Folium map using the class the library provides for plotting choropleth maps. We choose the data to plot, the colors used to fill the districts, and the feature of our GeoJSON file to which the data will be joined (in this case, the district name).
geomap = #The path of your GeoJSON file
with open(geomap,encoding='euc-kr') as f:
    json_data=json.load(f)
ac=folium.Map(location=(37.56629, 126.979808),tiles="cartodbpositron", zoom_start=8)
folium.Choropleth(
    geo_data=json_data,
    name="choropleth",
    data=cafeslist,
    columns=["Area","Number of cafes"],
    key_on="feature.properties.SIG_KOR_NM",
    fill_color="BuPu",
    fill_opacity=0.7,
    line_opacity=0.2,
    nan_fill_color='white',
    bins=5,
    legend_name="Number of Starbucks",
).add_to(ac)
ac
Although the map is plotted with each district filled with a color that depends on its number of Starbucks, the result for this type of data is not very informative. Because one district concentrates a significantly larger number of cafes than the others, only a few districts, mainly in Seoul, have a noticeable color compared to the rest of the country, making it hard to see a trend or draw a conclusion.
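One way to soften this effect, offered here as a suggestion rather than something the exercise did, is to pass explicit bin edges instead of a number of bins. `folium.Choropleth` accepts a sequence for `bins`, so the edges can follow the quantiles of the data. The sketch below only computes such edges with the standard library on made-up counts; the helper name `quantile_edges` is hypothetical.

```python
import statistics

def quantile_edges(values, n=4):
    """Return increasing bin edges: minimum, the n-1 quantile cut points, maximum."""
    cuts = statistics.quantiles(values, n=n, method='inclusive')
    # A set drops repeated edges, which are common in skewed count data.
    return sorted(set([min(values)] + cuts + [max(values)]))

# Made-up cafe counts in which one district dominates.
counts = [1, 1, 2, 3, 4, 5, 8, 90]
edges = quantile_edges(counts)
print(edges)
```

The resulting list could then be passed as `bins=edges` in the `folium.Choropleth` call, so that each color covers roughly the same number of districts.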
There is also an error in the visualization caused by the names of some districts in Korea. Some districts in different areas share the same name, both in the data and in the GeoJSON file. As a result, the number of cafes of the last district with a given name in the data is used to fill every district with that name on the map, even where it does not refer to that district. It is important to know that this error comes from the data itself and not from the visualization.
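A quick way to see which names are affected is to look for district names that occur in more than one area. The sketch below uses a few sample (area, district) pairs, written in romanized form, standing in for the `sido_name` and `gugun_name` columns; the real columns come from the site's data.

```python
from collections import defaultdict

# Sample (area, district) pairs standing in for sido_name and gugun_name.
pairs = [
    ('Seoul', 'Jung-gu'),
    ('Seoul', 'Mapo-gu'),
    ('Busan', 'Jung-gu'),
    ('Daegu', 'Jung-gu'),
    ('Busan', 'Haeundae-gu'),
]

# Collect the areas in which each district name appears.
areas_by_district = defaultdict(set)
for area, district in pairs:
    areas_by_district[district].add(area)

# A name is ambiguous when it belongs to more than one area.
ambiguous = {d: sorted(a) for d, a in areas_by_district.items() if len(a) > 1}
print(ambiguous)
```

A possible fix, not attempted in this exercise, would be to join the data and the GeoJSON on a key that also identifies the area, such as a district code, provided the GeoJSON file offers one.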
It is possible to improve the visualization by adding all the Starbucks to the choropleth map. This gives a better reference for the distribution of Starbucks cafes in the country. The icon used in the first visualization was also replaced by a circle with a radius of 200 meters, which shows how far apart the cafes are in areas with a high density of cafes.
geomap = #The path of your GeoJSON file
with open(geomap,encoding='euc-kr') as f:
    json_data=json.load(f)
ae=folium.Map(location=(37.56629, 126.979808),tiles="cartodbpositron", zoom_start=8)
folium.Choropleth(
    geo_data=json_data,
    name="choropleth",
    data=cafeslist,
    columns=["Area","Number of cafes"],
    key_on="feature.properties.SIG_KOR_NM",
    fill_color="BuPu",
    fill_opacity=0.8,
    line_opacity=0.2,
    nan_fill_color='white',
    bins=4,
    legend_name="Number of Starbucks",
).add_to(ae)
for x in cafeslocation:
    for ix, row in x.iterrows():
        folium.Circle((row['lat'], row['lot']),
                      radius=200, #Radius in meters
                      fill_color='blue', #folium's keyword is fill_color, not fillcolor
                      fill=True).add_to(ae)
folium.LayerControl().add_to(ae)
ae
geomap = #The path of your GeoJSON file
with open(geomap,encoding='euc-kr') as f:
    json_data=json.load(f)
az=folium.Map(location=(37.56629, 126.979808), zoom_start=8)
for x in cafeslocation:
    for ix, row in x.iterrows():
        folium.Circle((row['lat'], row['lot']),
                      radius=100,
                      fill=True
                      ).add_to(az)
az
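The circles give a visual sense of how far apart the cafes are. The same question can be checked numerically with the haversine formula, which is not part of the original exercise and is shown here only as a minimal sketch. The two points are coordinates that already appear in this exercise (the map center and the reference coordinates sent with the request); the function name `haversine_m` is hypothetical, and the Earth radius is the usual mean-radius approximation.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    r = 6371000  # mean Earth radius in meters (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Map center used in this exercise and the reference point sent with the request.
d = haversine_m(37.56629, 126.979808, 37.56682, 126.97865)
print(round(d), 'meters')
```

Two circles of radius 200 meters overlap when their centers are less than 400 meters apart, so a distance like this one can be compared directly with what the map shows.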
Folium also provides various plugins that can be very useful for map visualization.
One example is the HeatMap plugin. Applied to our data on Starbucks cafes, it shows the areas with the highest density of cafes, giving a good reference for how the cafes are distributed across the country.
#Getting the data prepared for the heatmap.
#It consists of a list of the location coordinates of every cafe.
heatmapdata=[]
for i in cafeslocation:
    for ix, row in i.iterrows():
        heatmapdata.append([row['lat'], row['lot']])
print(heatmapdata[:3])
m=folium.Map(location=(37.56629, 126.879808), zoom_start=10)
# Plot it on the map
plugins.HeatMap(heatmapdata, radius=20).add_to(m)
m
We can improve the visualization by adding every cafe (as a point) to the map, which shows the density and the distribution of the cafes more clearly.
for x in cafeslocation:
    for ix, row in x.iterrows():
        folium.Circle((row['lat'], row['lot']),
                      radius=20,
                      weight=1,
                      color='blue',
                      fill=True,
                      ).add_to(m)
m
Visualizations are useful to represent, provide and share information and data, making them easier for us to understand. Map visualizations serve the same purpose, but the data is tied to a geographical location, like a country, a state or an area. Just as Python is a great tool for data analysis, it is also a solid option for map visualizations.
The quality of a visualization depends a lot on the quality of the data and on whether the type of visualization is well chosen for the information you want to convey. This is also true with Python. Perhaps one disadvantage of Python for map visualizations is that it is not as user-friendly as tools dedicated to visualization, like Tableau: coding is necessary to create the visualizations, and there is no user interface. Nevertheless, it provides the options needed to create good map visualizations.
You can provide reliable information in map visualizations with Python.
The following links were the basis and guidance for the exercise.
Sources:
https://financedata.github.io/posts/python_starbucks_map.html
https://mkjjo.github.io/python/2019/08/18/korea_population.html
The GeoJSON file used in the exercise can be found here.
https://github.com/southkorea/southkorea-maps
This one is another alternative.
https://blog.naver.com/PostView.nhn?blogId=kcchang61&logNo=221350672356